首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
In the paper, an efficient parallel implementation of Edmonds' algorithm is suggested for finding optimum graph branching on an abstract model of the SIMD type with vertical data processing (STAR machine). For this, associative parallel algorithms for finding critical circuit and its contraction, as well as for unfolding embedded critical circuits, are constructed for directed weighted graphs represented as a list of arcs and their weights. It is shown that the execution of Edmonds' algorithm on a STAR machine requires O(nlogn) time, where nis the number of graph vertexes. Basic advantages of the parallel implementation of Edmonds' algorithm compared to its implementation on sequential computers are discussed.  相似文献   

2.
Lea  R.M. 《Computer》1975,8(11):25-32
Research into new computer structures, which would be better suited to non-numerical information processing tasks, has been in progress for some years.1,2,3The problem has been to design a computing system with high hardware efficiency and low software complexity over a wide range of these applications. Recent research4,5,6has indicated that these apparently conflicting requirements could possibly be achieved by a parallel processing system containing content-addressable storage. Hence there is a revival of interest in associative memories and associative processors.  相似文献   

3.
网络安全协议的处理过程通常需要进行大量加解密运算,会对网络传输带宽产生较大影响。文章研究了基于多核处理器的网络安全协议并行处理技术,提出了针对IPSec和SSL协议的并行处理模型,这些模型可以有效改进协议处理效率,提高网络的整体性能。  相似文献   

4.
Three IC technologies result in different outcomes-performance and cost-in two case studies. The author compares their designs in terms of silicon area, substrate size, and power consumption  相似文献   

5.
Cohler  E.U. Storer  J.E. 《Computer》1981,14(9):28-36
Based on the natural division of mathematical problems, functional parallelism becomes the architectural key for improving speed/cost ratios for array processors.  相似文献   

6.
7.
高岚  王锐  钱德沛 《软件学报》2013,24(6):1390-1402
多核处理器并行程序的确定性重放是实现并行程序调试的有效手段,对并行编程有重要意义。但由于多核架构下存在共享访存不同步问题,并行程序确定性重放的研究依然面临多方面的挑战,给并行程序的调试带来很大困难,严重影响了多核架构下并行程序的普及和发展。分析了多核处理器造成并行程序确定性重放难以实现的关键因素,总结了确定性重放的评价指标,综述了近年来学术界对并行程序确定性重放的研究。根据总结的评价指标,从纯软件方式和硬件支持方式对目前的确定性重放方法进行了分析与对比,并在此基础上对多核架构下并行程序的确定性重放未来的研究趋势和应用前景进行了展望。  相似文献   

8.
Due to the increasing power consumption in modern computing systems, energy management has become an important research area in the last decade. Recently, multicore has emerged to be an energy efficient architecture that exploits parallelisms in modern applications. However, as the number of cores on a single chip continues to increase, it has been a grand challenge on how to effectively manage the energy efficiency of multicore-based systems. In this paper, based on the voltage island and dynamic voltage and frequency scaling (DVFS) techniques, we investigate the energy efficiency of block-partitioned multicore processors, where cores are grouped into blocks with the cores on one block sharing a DVFSenabled power supply. Depending on the number of cores on each block, we study both symmetric and asymmetric block configurations. We develop a system-level power model (which can support various power management techniques) and derive both block- and system-wide energy-efficient frequencies for systems with block-partitioned multicore processors. Based on the power model, we prove that, for embarrassingly parallel applications, having all cores on a single block can achieve the same energy savings as that of the individual block configuration (where each core forms a single block and has its own power supply). However, for applications with limited degrees of parallelism, we show the superiority of the buddy-asymmetric block configuration, where the number of required blocks (and power supplies) is logarithmically related to the number of cores on the chip, in that it can achieve the same amount of energy savings as that of the individual block configuration. The energy efficiency of different block configurations is further evaluated through extensive simulations with both synthetic as well as a real life application.  相似文献   

9.
对不同规模,不同要求的问题进行并行计算,应选用不同数量的处理器参与计算才能获得最佳的并行处理性能。本文以连续系统并行仿真为例,提出适合于并行仿真系统的我处理器数确定的方法。  相似文献   

10.
11.
在神威高性能多核服务器上,自动并行化编译系统为识别和申明程序中的并行性,产生的OpenMP程序没有经过充分的优化,其采用简单的fork-join模型,存在大量的并行循环嵌套,导致运行效率低。为提升自动并行化编译系统产生的OpenMP程序的运行效率,提出一种并行域重构优化技术。并行域重构技术通过合并程序中的并行域和扩展嵌套循环中的并行域范围,减少OpenMP程序的并行域数目,降低线程组频繁创建和合并等控制开销,将简单fork-join模型的OpenMP程序转换为性能更为高效的单程序多数据模型的OpenMP程序。实验结果表明,在新一代神威高性能多核服务器SW1621平台上,并行域重构技术在NPB3.3-OMP测试集和SPEC OMP2012测试集上的运行效率分别提高了10.77%和7.94%的,可有效提升自动并行化编译系统OpenMP程序的执行效率。  相似文献   

12.
Earley's algorithm has been commonly used for the parsing of general context-free languages and the error-correcting parsing in syntactic pattern recognition. The time complexity for parsing is 0(n3). This paper presents a parallel Earley's recognition algorithm in terms of an ``X*' operator. By restricting the input context-free grammar to be ?-free, the parallel algorithm can be executed on a triangular-shape VLSI array. This array system has an efficient way of moving data to the right place at the right time. Simulation results show that this system can recognize a string with length n in 2n + 1 system time. We also present a parallel parse-extraction algorithm, a complete parsing algorithm, and an error-correcting recognition algorithm. The parallel complete parsing algorithm has been simulated on a processor array which is similar to the triangular VLSI array. For an input string of length n the processor array will give the correct right-parse at system time 2n + 1 if the string is accepted. The error-correcting recognition algorithm has also been simulated on a triangular VLSI array. This array recognizes an erroneous string of length n in time 2n + 1 and gives the correct error count. These parallel algorithms are especially useful for syntactic pattern recognition.  相似文献   

13.
We describe new architectures for the efficient computation of redundant manipulator kinematics (direct and inverse). By calculating the core of the problem in hardware, we can make full use of the redundancy by implementing more complex self-motion algorithms. A key component of our architecture is the calculation in the VLSI hardware of the Singular Value Decomposition of the manipulator Jacobian. Recent advances in VLSI have allowed the mapping of complex algorithms to hardware using systolic arrays with advanced computer arithmetic algorithms, such as the coordinate rotation (CORDIC) algorithms. We use CORDIC arithmetic in the novel design of our special-purpose VLSI array, which is used in computation of the Direct Kinematics Solution (DKS), the manipulator Jacobian, as well as the Jacobian Pseudoinverse. Application-specific (subtask-dependent) portions of the inverse kinematics are handled in parallel by a DSP processor which interfaces with the custom hardware and the host machine. The architecture and algorithm development is valid for general redundant manipulators and a wide range of processors currently available and under development commercially.  相似文献   

14.
PRAM和LARPBS模型上的近似串匹配并行算法   总被引:15,自引:1,他引:15  
钟诚  陈国良 《软件学报》2004,15(2):159-169
近似串匹配技术在网络信息搜索、数字图书馆、模式识别、文本挖掘、IP路由查找、网络入侵检测、生物信息学、音乐研究计算等领域具有广泛的应用.基于CREW-PRAM(parallel random access machine with concurrent read and exclusive write)模型,采用波前式并行推进的方法直接计算编辑距离矩阵D,设计了一个允许k-差别的近似串匹配动态规划并行算法,该算法使用(m+1)个处理器,时间复杂度为O(n),算法理论上达到线性加速;采取水平和斜向双并行计算编辑距离矩阵D的方法,设计了一个使用((m+1)个处理器和O(n/(+m)时间的、可伸缩的、允许k-差别的近似串匹配动态规划并行算法,.基于分治策略,通过灵活拆分总线和合并子总线动态重构光总线系统,并充分利用光总线的消息播送技术和并行计算前缀和的方法,实现了汉明距离的并行计算,设计了两个基于LARPBS(linear arrays with reconfigurable pipelined bus system)模型的通信高效、可扩放的允许k-误配的近似串匹配并行算法,其中一个算法使用n个处理器,时间为O(m);另一个为常数时间算法,使用mn个处理器.  相似文献   

15.
New load and store processing algorithms let memory-latency-tolerant architectures sustain thousands of in-flight instructions without scaling cycle-critical fully-associative load and store queues. These algorithms rely on redoing some stores after fetching cache miss data from memory (to fix memory dependences). Doing so provides better power and area characteristics than constantly enforcing memory dependences among a several loads and stores, many of which have unknown addresses.  相似文献   

16.
针对3GPP LTE标准中的Turbo码,设计了一种基于最大后验概率算法的低功耗并行译码器.根据二次置换多项式交织器的整数数学特性,分解并行处理中每个译码器的交织地址为子码块地址和块内偏移地址,提出一种高效的递归计算子码块交织地址的算法,使得并行度可以为任意值,而不仅仅限于2的幂次;并依此设计了低复杂度的实时递归计算交织器的互连结构,以避免传统实现方法中对交织地址的存储,有效地简化了Turbo译码器本征信息处理的互连网络,减小了实现面积和功耗;最后从结构级进行优化设计,进一步减少面积和功耗.实验结果表明,在40nm的工艺下,约束工作电压为1.18V、时钟频率为282MHz,版图实现可以达到130Mb/s的吞吐量,且功耗仅为107mW,每次迭代能量效率为0.107nJ/bit.  相似文献   

17.
18.
关联任务在多核处理器上并行调度所产生的通信时延,会对任务调度长度和处理器利用率造成负面影响,为了改善多核系统对关联任务的处理性能,针对关联任务在多核处理器上的调度特点,提出一种并行感知调度算法.计算各任务与终点间的最长路径值,按照该值的降序来分配任务调度次序,在分配处理器内核时兼顾关联度和任务最早可执行时间,设置最佳匹...  相似文献   

19.
一种独立任务的同型机调度快速算法   总被引:4,自引:0,他引:4  
如何将n个独立任务调度到m台同型机上加工,使总完成时间最短,是一个复杂问题.通过分析Bound Fit预备算法的性质,结合MULTIFIT和Bound Fit提出QUICKFIT算法;对相同机器数和任务数,QUICKFIT能用比MULTIFIT和Bound Fit都少的迭代次数得到相同的总完成时间.实验结果表明,任务机器比越大,QUICKFIT算法的性能就越优于MULTIFIT和Bound Fit.绝大多数情况下,总完成时间等于MULTIFIT和Bound Fit中的最小者.该算法适用于大规模同型机调度.  相似文献   

20.
In this work we explore three modifications to the architecture of a Data-Driven VLSI array of processors, previously introduced in Koren et al. (IEEE Computer21, 10 (Oct. 1988), 30-43). These modifications are geared toward improving the array utilization, as well as the performance of the mapped algorithms. The first modification considered increases the internal parallelism present in each array processing element, allowing it to simultaneously execute two instructions. A second modification improves the connectivity between processing elements by adding wires and switches between these elements. The third modification creates blocks of processing elements, featuring tighter coupling and faster communication, to take advantage of algorithm locality. Each change is evaluated by asserting its impact on the mapping of algorithms onto the modified array.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号