Similar literature
20 similar documents found.
1.
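Zhang Lingjie (张凌洁), Zhao Ying (赵英). Electronic Design Engineering (《电子设计工程》), 2012, 20(17): 15-18, 22
The Floyd-Warshall (F-W) algorithm is the classic algorithm for the all-pairs shortest paths (APSP) problem in graph theory. To speed up the computation, the authors implement it with general-purpose GPU computing. Starting from the principle of the algorithm, the article develops step by step a parallel F-W algorithm that runs on the GPU. It then builds an improved GPU-parallel F-W algorithm based on matrix blocking and the use of GPU shared memory. Extensive tests show that the GPU-parallel program achieves a speedup of more than one hundred times over a conventional CPU-parallel program.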
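As a point of reference for the abstract above, the following is a minimal serial sketch of the Floyd-Warshall recurrence in Python. It is not the authors' GPU implementation; the function name `floyd_warshall` and the four-node example graph are illustrative assumptions. It only shows the property a GPU version can exploit: for a fixed intermediate vertex `k`, the updates of all `(i, j)` pairs are independent of one another.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """Return all-pairs shortest-path distances for a dense weight matrix.

    weights[i][j] is the edge weight from i to j, INF if there is no edge,
    and 0 on the diagonal. Negative cycles are assumed to be absent.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # work on a copy
    for k in range(n):  # intermediate vertex: this loop must stay sequential
        row_k = dist[k]
        # For a fixed k, every (i, j) update below is independent of the
        # others, which is what a parallel version can distribute.
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue  # no path i -> k, so nothing can improve via k
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


# Hypothetical 4-node directed graph used only for illustration.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = floyd_warshall(graph)
```

In this example `shortest[1][0]` is 5, reached through vertices 2 and 3 rather than over the direct edge of weight 8.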

2.
This paper describes a new aluminum pattern formation process using the substitution reaction of aluminum for polysilicon (APSP), and its application to the fabrication of self-aligned aluminum-gate MOSFETs. The APSP method uses the intensive interdiffusion reaction between aluminum and polysilicon observed in contact structures where the aluminum film overlaps polysilicon and is heat treated below the eutectic temperature (577°C). The basic idea in the fabrication of self-aligned aluminum-gate MOSFETs using APSP is to replace the polysilicon gate with an aluminum gate in the final step, following fabrication of the self-aligned polysilicon-gate MOSFET. It is shown that the new fabrication process can be followed by almost all of the conventional polysilicon-gate processes. It is also shown that the electrical characteristics of the aluminum-gate MOSFET fabricated using APSP are nearly the same as those of polysilicon-gate MOSFETs fabricated on the same wafer.

3.
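Research on a multithreaded parallel route-planning method based on the A* algorithm   Total citations: 3 (self-citations: 2, citations by others: 1)
Parallel computing is an effective means of speeding up route planning, and the A* algorithm is implicitly parallel. Multi-CPU, multithreading technology frees parallel computing from workstations or workgroup computers, so the A* algorithm can be computed in parallel on a single machine. The parallel computation is then adapted to the characteristics of the A* algorithm and applied to route planning for cruise missiles. Simulation results show that the improved parallel algorithm leaves the quality of the planned route unchanged while considerably improving computation speed and stability, which favors fast route planning.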

4.
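Li Zhi (李智), Liu Yuan (刘源), Yan Bin (闫斌). Communications Technology (《通信技术》), 2015, 48(4): 441-446
For voice communication over self-organizing (ad hoc) networks, and to address the main problems in audio transmission such as delay and packet loss, an analytic hierarchy process (AHP) evaluation model is built on top of ZigBee network routing, and an audio transmission routing algorithm, AHP-RP, is designed. By analyzing how path link quality, audio load, path lifetime, and path length affect audio quality, a comparison matrix with these four network factors as its criteria is constructed and the optimal transmission path is selected. Simulation and tests on a real communication platform show that the algorithm adapts effectively to the network state and markedly improves voice call quality.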
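The abstract above does not say how AHP-RP derives its weights or scores paths, so the following Python sketch is only an illustration of the general AHP idea under stated assumptions: priority weights are approximated with the row geometric-mean method, and each candidate path is ranked by a weighted sum of normalized factor scores. The comparison values, the path names, the scores, and the helpers `ahp_weights` and `best_path` are all hypothetical.

```python
import math


def ahp_weights(comparison):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(comparison)
    geo_means = [math.prod(row) ** (1.0 / n) for row in comparison]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def best_path(weights, candidates):
    """Return the name of the candidate with the highest weighted score."""
    def score(values):
        return sum(w * v for w, v in zip(weights, values))
    return max(candidates, key=lambda name: score(candidates[name]))


# Hypothetical pairwise judgments; rows and columns are ordered as
# link quality, audio load, path lifetime, path length.
# comparison[i][j] states how much more important factor i is than factor j.
comparison = [
    [1.0, 2.0, 4.0, 4.0],
    [0.5, 1.0, 2.0, 2.0],
    [0.25, 0.5, 1.0, 1.0],
    [0.25, 0.5, 1.0, 1.0],
]
weights = ahp_weights(comparison)

# Hypothetical factor scores, already normalized to [0, 1] so that a higher
# value is better (a lighter load or a shorter path gets a higher score).
candidates = {
    "path_a": [0.9, 0.5, 0.6, 0.4],
    "path_b": [0.6, 0.9, 0.8, 0.9],
}
chosen = best_path(weights, candidates)
```

Because the example matrix is perfectly consistent, the geometric-mean weights coincide with the exact priority vector; for real judgment matrices a consistency check would normally precede their use.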

5.
General purpose graphics processing units (GPGPUs) have gained much popularity in scientific computing for speeding up computationally intensive workloads. Resource allocation in terms of power and subcarrier assignment is one of the challenging problems in current wireless standards because of its high computational complexity. The Hungarian algorithm (HA), which has been extensively applied to linear assignment problems (LAPs), has been seen to provide encouraging results in resource allocation for wireless communication systems. This paper presents a compute unified device architecture (CUDA) implementation of the HA on a graphics processing unit (GPU) for this problem. The HA has been implemented on a parallel architecture to solve the subcarrier assignment problem and maximize spectral efficiency. The proposed implementation uses the "Kuhn-Munkres" algorithm with effective modifications in order to fully exploit the capabilities of modern GPU devices. A cost matrix for maximum assignment has been defined, leading to a low-complexity matrix compression along with a highly optimized CUDA reduction and a parallel alternating path search process. All these optimizations lead to an efficient implementation with superior performance when compared with existing parallel implementations.

6.
Hu Yanzhi, Zhang Fengbin, Tian Tian, Ma Dawei, Shi Zhiyong. Wireless Networks, 2022, 28(3): 1129-1145

Data mules are extensively used for data collection in wireless sensor networks (WSNs), which significantly reduces energy consumption at sensor nodes but increases the data delivery latency. In this paper, we focus on minimizing the length of the traveling path to reduce the data delivery latency. We first model the shortest path planning of a data mule as an optimization problem, and propose an optimal model and a corresponding solving algorithm. The optimal model solution has high time complexity, mainly because node visit arrangements and data access point (DAP) settings are optimized in parallel during the solution process in order to obtain the shortest path. To improve computational efficiency, we next give an approximate model and its solving algorithm, which decomposes the path planning problem into the Traveling Salesman Problem (TSP) and a nonlinear optimization problem and optimizes the two parts separately. The proposed approach can express the influence of the communication range of each sensor node, which makes it suitable for more general application scenarios than the existing methods. Theoretical analysis and simulation results show that the solution performs well in terms of path length and computational effort.


7.
Multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) technology is regarded as a promising solution for offering ultra-high data rates in wireless communications. This paper presents a field-programmable gate array (FPGA) implementation of an early-pruned K-Best detection algorithm applicable to ultra-high data throughput MIMO-OFDM communication systems. The algorithm simplifies the computation significantly compared to the conventional K-Best algorithm, with negligible bit error ratio (BER) degradation. A fully parallel structure is implemented on an FPGA platform; it achieves 1.9 Gb/s detection throughput, about three times that of the previous implementation. Moreover, a pre-processing method is realized that reduces the number of multipliers inside the detector and shrinks the critical path delay to 8.32 ns. Together with a candidate-sharing and early-pruning architecture that further saves hardware cost, a high-speed, compact MIMO signal detector is demonstrated.

8.
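Communication among and parallel execution on high-performance multicore DSPs are key to multicore system design. The article analyzes the resource consumption of each module of a video target-tracking algorithm and proposes a parallelization approach for each part; proposes an improved binarization-mask method for extracting the background image; proposes an auxiliary parallel structure to balance the load; and studies the inter-process communication (IPC) synchronization mechanism for DSP multicore communication, using a pipelined parallel structure to build a three-core synchronous parallel processing system. In experiments, the communication latency was measured and the target-tracking program was partitioned sensibly across the three DSP cores for parallel processing, meeting the real-time requirement.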

9.
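As SAR imaging technology continues to develop, the requirements on the imaging accuracy and real-time rate of SAR images keep rising; in the military domain especially, a high real-time rate is a key metric of a SAR imaging system. This paper proposes a medium-grained parallel SAR imaging algorithm based on the CS imaging algorithm. Every processing step of the algorithm can be carried out in parallel; it is a task-level form of parallelism and suits parallel processing systems with relatively high communication performance. Experiments on the Dawning 3000 (曙光3000) show that the algorithm achieves a high real-time rate and high parallel efficiency.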

10.
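Particle swarm optimization is combined with parallel computing models. Based on three different parallel computing models (a model with a central controller, a ring-structured model with buffers, and the BSP parallel computing model), corresponding parallel particle swarm algorithms are designed and their performance is analyzed in detail. Extensive experiments show that the communication period between sub-populations is an important tunable parameter; when chosen appropriately, it improves solution quality as well as the convergence and optimality of the algorithm.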

11.
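To address the high transmission delay and low throughput of air-ground communication systems, a Congestion-aware Load Distribution (CALD) algorithm is proposed. From changes in the quality of each communication path, the algorithm estimates how the congestion window of that path changes and adaptively adjusts the size of each subflow allocation, reducing end-to-end delay to cope with the low bandwidth and high latency of air-ground communication. Experiments on a multipath end-to-end server set up for the study show that the proposed algorithm outperforms existing subflow allocation algorithms in throughput and end-to-end delay.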

12.
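Shen Zhi (沈治). Information Technology (《信息技术》), 2011, (11): 103-106
The ant colony algorithm is an intelligent optimization algorithm inspired by the food-foraging behavior of ant colonies in nature. The paper introduces the pheromone-based shortest-path search strategy that ants use when foraging and applies it to path optimization for AGV carts. Simulation verifies that the algorithm can find the simplest path. Using AGV address recognition technology, the paper also describes the communication protocol between the AGV cart and the computer, so as to achieve good control performance.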

13.
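Liu Chuan (刘川), Huang Zaichao (黄在朝), Tao Jing (陶静), Jia Huibin (贾惠彬). Telecommunications Science (《电信科学》), 2018, 34(10): 47-52
Current routing algorithms for system-protection communication networks compute a fast and reliable transmission path under combined delay and reliability requirements. They do not consider that, when the power grid suffers a fault or network traffic is heavy, the queueing delay at many nodes of the communication network increases greatly. Ignoring the effect of queueing delay on the total transmission delay of a path leads to wrong path choices and thus harms the real-time performance of system protection. To address this, a routing algorithm that accounts for queueing delay is proposed; it computes the optimal path with the minimum total information transmission delay, improving the real-time performance of system-protection communication. Experimental results show that the path computed by the proposed algorithm has the minimum total transmission delay while meeting the reliability requirements of system protection.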
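The abstract above does not name the search procedure it uses, so the following Python sketch substitutes plain Dijkstra's algorithm to illustrate the underlying point: once per-node queueing delay is added to the link delay, the minimum-delay path can differ from the one chosen on link delay alone. The topology, the delay values, and the function `min_delay_path` are hypothetical, and charging a node's queueing delay when a packet enters it is an assumption of this sketch.

```python
import heapq


def min_delay_path(links, queue_delay, src, dst):
    """Dijkstra search where entering a node costs link delay + queueing delay.

    links: {node: {neighbor: link_delay}}
    queue_delay: {node: queueing delay at that node}; missing nodes count as 0.
    Returns (path, total_delay), or (None, inf) if dst is unreachable.
    """
    inf = float("inf")
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        delay, node = heapq.heappop(heap)
        if node == dst:
            break
        if delay > best.get(node, inf):
            continue  # stale heap entry
        for nxt, link_delay in links.get(node, {}).items():
            candidate = delay + link_delay + queue_delay.get(nxt, 0.0)
            if candidate < best.get(nxt, inf):
                best[nxt] = candidate
                prev[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    if dst not in best:
        return None, inf
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[dst]


# Hypothetical network: A-B-D has the shorter links, but B is congested.
links = {"A": {"B": 1.0, "C": 2.0}, "B": {"D": 1.0}, "C": {"D": 2.0}, "D": {}}
queue_delay = {"B": 5.0, "C": 0.5}

path_ignoring_queues, delay_ignoring_queues = min_delay_path(links, {}, "A", "D")
path_with_queues, delay_with_queues = min_delay_path(links, queue_delay, "A", "D")
```

With queueing ignored the search picks A-B-D (delay 2.0); with the congested node B accounted for it picks A-C-D (delay 4.5), whereas A-B-D would actually cost 7.0.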

14.
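A maze shortest-path algorithm based on PCNN   Total citations: 6 (self-citations: 0, citations by others: 6)
Exploiting the parallel operation of the pulse-coupled neural network (PCNN), this paper proposes a maze shortest-path search algorithm based on the PCNN model. The algorithm is analyzed and discussed theoretically, and the concrete algorithm and experimental results are given, verifying the effectiveness of the method. Compared with other algorithms, the method can complete the shortest-path search in the shortest time.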

15.
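To make electromagnetic computation secure, reliable, and independently controllable, this paper develops a large-scale parallel method of moments (MoM) on the domestically built many-core supercomputer Tianhe-2 (“天河二号”). To relieve the communication pressure on the cluster during large-scale parallel computation and to accelerate the solution of the MoM integral equation, the paper shows by analysis that the matrix produced by discretizing the MoM electric-field integral equation is diagonally dominant and, on that basis, proposes a new LU decomposition algorithm, the diagonal-block pivoting LU decomposition (BDPLU) algorithm. The algorithm reduces the computation in panel column factorization and, more importantly, completely eliminates the MPI communication overhead of pivoting. With BDPLU, the parallel MoM scales beyond 6×10^5 CPU cores, which the authors report as the largest parallel MoM computation achieved so far on a domestic supercomputing platform, and the parallel efficiency of its matrix solution reaches 51.95%. Numerical results show that the parallel MoM can solve large-scale electromagnetic problems accurately and efficiently on domestic supercomputing platforms.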

16.
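Accurate analysis of the electromagnetic characteristics of complex targets often requires enormous storage and extremely long computation times. To address this problem, and making use of China's rapidly developing supercomputer systems, the paper studies a high-performance electromagnetic algorithm capable of accurate and efficient simulation: the higher-order method of moments. An element pre-selection method is proposed to eliminate redundant computation during parallel matrix filling and accelerate the filling process. A new parallel LU decomposition algorithm with fewer communication operations and a smaller communication volume is proposed to accelerate the solution of the matrix equation. Numerical tests show that both the parallel matrix-filling algorithm and the parallel matrix-equation solver achieve high parallel performance on supercomputer platforms, greatly improving the simulation capability of the method of moments.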

17.
In this paper, a unified algebraic transformation approach is presented for designing parallel recursive and adaptive digital filters and singular value decomposition (SVD) algorithms. The approach is based on exploring some algebraic properties of the target algorithms' representations. Several typical modern digital signal processing examples are presented to illustrate the applications of the technique. They include the cascaded orthogonal recursive digital filter, the Givens rotation-based adaptive inverse QR algorithm for channel equalization, and the QR decomposition-based SVD algorithms. All three examples exhibit similar throughput constraints: there are long feedback loops in the algorithms' signal flow graph representation, and the critical path is proportional to the size of the problem. Applying the proposed algebraic transformation techniques, parallel architectures are obtained for all three examples. For the cascaded orthogonal recursive filter, retiming transformation and orthogonal matrix decompositions (or pseudo-commutativity) are applied to obtain parallel filter architectures with a critical path of five Givens rotations. For the adaptive inverse QR algorithm, the commutativity and associativity of the matrix multiplications are applied to obtain parallel architectures with a critical path of either four Givens rotations or three Givens rotations plus two multiply-add operations, whichever turns out to be larger. For the SVD algorithms, retiming and associativity of the matrix multiplications are applied to derive parallel architectures with a critical path of eight Givens rotations. The critical paths of all parallel architectures are independent of the problem size, whereas they are proportional to the problem size in the original sequential algorithms. Parallelism is achieved at the expense of a slight increase (or no increase, in the SVD case) in the algorithms' computational complexity.

18.
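Implementation of a parallel FDTD algorithm and optimization of its data communication   Total citations: 1 (self-citations: 0, citations by others: 1)
Using a computer local area network, parallel FDTD computation is implemented with message passing (MPI) and domain decomposition. The parallel FDTD algorithm is verified on the example of an infinitely long line current source radiating in free space. The results show that the parallel and serial algorithms give the same results and that the parallel algorithm effectively improves computational efficiency. Finally, the data communication of the parallel algorithm is optimized by reducing the volume of communicated data, optimizing the data exchange scheme, and overlapping communication with computation, which markedly improves parallel efficiency.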

19.
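An optimized algorithm for an FPGA-based Viterbi decoder   Total citations: 1 (self-citations: 1, citations by others: 0)
Viterbi decoding is the optimal decoding algorithm for convolutional codes. To address resource consumption, decoding speed, processing delay, and structure in Viterbi decoder implementations, and based on an analysis of the Viterbi decoding algorithm and the characteristics of the convolutional-code trellis diagram, an optimized decoder implementation for FPGA designs is proposed. It uses a fully parallel structure, stores decision bits synchronously with path information vectors, and applies minimal quantization of the path metrics. Tests and experiments show that, compared with the conventional decoding algorithm, the scheme offers higher speed, lower delay, and a simpler structure.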

20.
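Zhao Taifei (赵太飞), Leng Yuxin (冷昱欣), Wang Yu (王玉). Laser Technology (《激光技术》), 2017, 41(5): 728-733
To study path recovery for helicopter formation flight when a link or node is interrupted, and based on the path loss of ultraviolet non-line-of-sight communication, the Dijkstra algorithm is used to find the optimal path of the formation's communication network subject to network connectivity, and path recovery after a link or node interruption is achieved by moving nodes. Theoretical and simulation analysis yields the path recovery behavior of the optimal path when different links are broken. The results show that, although the proposed algorithm needs 2 s to 3 s for node movement, compared with the path re-search method it reduces the link convergence time of 3-hop and 4-hop nodes by 0.2 ms and 0.4 ms, respectively, and likewise reduces the path weights by 20 dB and 45 dB; the algorithm is therefore feasible. The result has some practical value for research on fast path recovery between aircraft groups.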
