Similar literature
20 similar documents found
1.
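Shared-memory multithreaded programming is an important way to exploit the parallelism of multicore processors. However, shared-memory multithreaded programs are nondeterministic at run time, and memory races between threads are the main source of that nondeterminism. Memory race information is voluminous and costly to record, so memory race recording is the key to deterministic replay of shared-memory multithreaded programs. This paper surveys existing software-implemented and hardware-implemented memory race recording mechanisms, summarizes the state of research on memory race recording, and points out the challenges the technology currently faces.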

2.
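Improved SPIHT algorithm   Total citations: 3, self-citations: 0, citations by others: 3
SPIHT is an efficient embedded zerotree coding algorithm, but it requires a large amount of memory, which hinders DSP or VLSI implementation. The LZC algorithm greatly reduces the memory requirements of the codec but also lowers its performance. Drawing on the idea behind LZC, this paper improves the original SPIHT algorithm so that it achieves the performance of SPIHT with only the memory requirements of LZC. An approximate search algorithm is also proposed to speed up the encoder.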

3.
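To handle the acquisition, processing, and post-event replay of high-speed data effectively, a data recording and replay device with large storage capacity was designed. The device uses an FPGA to implement the interface between the PCI bus and the computer, and uses the Visual C++ 6.0 development environment for the human-machine interaction control and replay demonstration software. The device can record multiple data types and is flexible and convenient in practical use.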

4.
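A new segmented ring FDL (fiber delay line) is proposed to resolve channel contention in OBS (optical burst switching) networks, and a new L_FF data channel scheduling algorithm is proposed based on the characteristics of the segmented ring FDL. Computer simulation results show that the segmented ring FDL outperforms the ordinary ring FDL, and that the L_FF scheduling algorithm suits the segmented ring FDL better than the existing FF scheduling algorithm.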

5.
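郭改文, 黄卡玛. Acta Electronica Sinica (《电子学报》), 2008, 36(9): 1839-1843
Using the principle of the unity of opposites between growth and shedding in the growth of natural trees, a competition model of natural tree growth is established and a competition algorithm simulating natural tree growth is proposed. To verify its soundness and effectiveness, the algorithm is applied to fitting complex curves. Compared with the standard genetic algorithm, it runs faster, uses less memory, and fits more accurately. Compared with the classical least-squares method, it uses less memory and is robust to noise. The algorithm offers a new approach to optimization design and computation.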

6.
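Repeat-accumulate (RA) codes are decoded with the belief propagation (BP) algorithm and perform close to the Shannon limit, but check-node updates use the hyperbolic tangent and inverse hyperbolic tangent functions, so the algorithm is computationally complex. To reduce decoding complexity while retaining good performance, this paper combines the lookup-table method with piecewise function approximation and proposes an improved decoding algorithm. The algorithm uses piecewise linear functions to approximate the function obtained by transforming and simplifying the BP check-node update, and uses a very small lookup table to obtain a correction value that compensates for the error between the approximating function and the transformed function. This avoids evaluating complex functions while keeping the error very small. The algorithm greatly reduces decoding complexity and achieves decoding performance close to that of BP.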

7.
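The Walsh-Hadamard transform can be used to solve systems of error-containing equations over the binary field, and this method can be applied to blind recognition of convolutional codes. When the system has many unknowns, however, the method's demand on computer memory makes it hard to apply in practice. This paper therefore proposes a convolutional code recognition method based on a segmented Walsh-Hadamard transform. The method segments the high-dimensional coefficient vectors of the system into two lower-dimensional coefficient vectors, decomposing the problem of solving a high-dimensional system with the Walsh-Hadamard transform into solving two lower-dimensional systems, and proves that the combination of the solution vectors of the two lower-dimensional systems is the solution of the high-dimensional system. The algorithm effectively reduces the memory requirement. Simulation results verify its effectiveness and show that it tolerates bit errors well.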
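As a rough illustration of the unsegmented baseline this abstract starts from, the sketch below solves a noisy homogeneous system a·x = 0 over GF(2) with a fast Walsh-Hadamard transform. It is not the paper's segmented method. The function names and the bitmask encoding of coefficient vectors are assumptions made for this example. The `counts` table has 2^n entries, which is the memory cost the abstract describes.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (natural order); len(values) must be a power of two."""
    v = list(values)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def solve_noisy_gf2(rows, n):
    """Find the nonzero x that satisfies the most equations a.x = 0 over GF(2).

    rows: coefficient vectors encoded as n-bit integers (hypothetical encoding).
    Returns (x, score), where score = satisfied equations - violated equations.
    """
    counts = [0] * (1 << n)          # 2**n entries: the memory bottleneck
    for a in rows:
        counts[a] += 1
    spectrum = fwht(counts)          # spectrum[x] = sum over rows of (-1)**(a.x)
    best = max(range(1, 1 << n), key=lambda x: spectrum[x])
    return best, spectrum[best]
```

The spectrum value at x equals the number of equations x satisfies minus the number it violates, so the largest entry identifies the most likely solution even when some equations are wrong.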

8.
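This paper presents a simplified algorithm for E-plane longitudinal metal septa. The method combines mode matching with the generalized scattering matrix cascading technique to determine the multimode scattering characteristics of the septum, and then derives a practical approximate formula, providing a way to reduce memory use and computation in the computer-aided design of E-plane circuits.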

9.
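Existing three-dimensional sparse imaging for through-wall radar suffers from two problems: constructing the dictionary matrix from grid time delays requires excessive memory, and the uncertain threshold parameters of convex-optimization sparse imaging algorithms degrade the quality of the reconstructed image. To address them, a learned approximate message passing three-dimensional imaging method based on a diffraction tomography sparse model is proposed. Building on the diffraction tomography imaging algorithm, the method constructs a fast Fourier transform operator to establish a sparse three-dimensional imaging model, then modifies the approximate message passing algorithm to compute the sparse solution and maps its iterations onto a multilayer neural network, and finally learns the tunable parameters of the network adaptively in a data-driven manner, achieving learned three-dimensional imaging. Results on simulated and experimental data show that the method both reduces the memory the system requires and avoids the effect of manual parameter tuning on imaging quality.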

10.
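翟锐, 吕科, 代双凤, 潘卫国. Acta Electronica Sinica (《电子学报》), 2016, 44(12): 2894-2899
With the development of remote sensing technology, terrain datasets are growing ever larger, far beyond what can be processed in memory, which has become a pressing problem. Raising system throughput through data compression is a common technique, but with the rapid development of GPU technology, traditional compression algorithms cannot make full use of the GPU. This paper therefore proposes a GPU-based terrain data compression method that compresses both the height field and the position information. Unlike other algorithms, which compress only height or only position, the main contribution of this paper is to process terrain position and height together, so that all information about the current vertex can be computed from the current segment. The algorithm approximates the terrain height field with Bézier curves and stores the residual for each vertex, achieving high-ratio compression that combines lossy and lossless coding. Experimental comparison with traditional methods shows that the method achieves good compression.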

11.
Data races are a major cause of concurrency bugs in multi-core programs. To address the high hardware cost of happens-before detection proposals, a lightweight hardware data race detection approach based on sliding window technology was proposed. It used sliding windows to save recent memory instructions during thread execution and dynamically detected data races with a small race distance, which more easily lead to concurrency bugs. Considering the race distance, parallel thread segments were subdivided into concurrent race regions with locks and concurrent race regions without locks. A pair of alternately rewritable sliding windows was used to store the memory instructions in a concurrent race region without locks, and a sliding window of variable size was used to store the memory instructions in a concurrent race region with locks. When a remote sharing access conflicted with memory accesses in the sliding windows, a data race was detected. In the hardware implementation, the addresses of the data in the sliding windows were automatically encoded into three small hardware signatures. Data races can be detected quickly without modifying the L1 cache or the cache coherence protocol messages. This approach gives users efficient guidance for diagnosing concurrency bugs that occur during the development and production runs of multi-core programs, with smaller hardware and bandwidth overhead.

12.
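Implementation of and key technologies for an LBS-based vehicle monitoring system   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents an LBS-based vehicle monitoring system. The system uses a "point-to-point" path matching algorithm and a "trajectory interpolation" trajectory playback algorithm, which effectively improve positioning accuracy and reduce application cost.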

13.
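To determine the stability and peak recording speed of a NAND Flash image recording system while reducing manual testing effort, a stress test system was designed. For stability testing, a speed stress model based on exponential regression and a test duration control model based on the log-normal distribution were designed. For measuring peak recording speed, a hardware-software co-test method based on the hill-climbing search algorithm and rate bisection was proposed. A hardware data generator whose rate is software-adjustable was designed using a valid-data duty-cycle mechanism; the hill-climbing algorithm roughly locates the interval containing the peak recording speed, and rate bisection then converges on the peak recording speed. The test report is output through a serial port and Gigabit Ethernet to a host computer for display. Experimental results show that the speed stress can be adjusted with a precision of 0.1 MB/s over a range of 0 to 1600 MB/s, and that hardware verification of read-back data incurs no clock delay. The NAND Flash recording system under test, fitted with eight SLC NAND Flash chips, has a peak recording speed of 240.12 MB/s and can work continuously for more than 24 h under a speed stress of 200 MB/s. The test system architecture is general-purpose and can be used to stress-test other transmission and recording systems.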
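The two-stage search described above (a coarse climb to bracket the peak rate, then bisection to refine it) can be sketched as follows. This is a simplified software-only illustration, not the paper's hardware-software co-test. The callback `passes(rate)`, assumed to return True when the recorder keeps up at that rate, and the default step, tolerance, and limit values are all assumptions chosen for the example.

```python
def find_peak_rate(passes, start=0.0, step=50.0, tol=0.1, limit=1600.0):
    """Estimate the highest rate (MB/s) at which passes(rate) is still True.

    Assumes passes() is monotone: True up to some peak rate, False above it.
    """
    lo = start
    if not passes(lo):
        raise ValueError("device fails even at the starting rate")

    # Stage 1: coarse climb in fixed steps until the first failure brackets the peak.
    hi = lo + step
    while hi <= limit and passes(hi):
        lo = hi
        hi += step
    if hi > limit:
        hi = limit
        if passes(hi):
            return hi                # device sustains the maximum test rate

    # Stage 2: bisection on [lo, hi]; lo always passes, hi always fails.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if passes(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For instance, `find_peak_rate(lambda r: r <= 240.12)` models an idealized device whose true peak is 240.12 MB/s and returns a passing rate within 0.1 MB/s of it.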

14.
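史方显, 曾立, 陈昱, 王淼, 占丰. Acta Electronica Sinica (《电子学报》), 2017, 45(2): 446-451
A new selective-iteration high-speed, high-precision CORDIC (COordinate Rotation DIgital Computer) algorithm is proposed. A table-driven method reduces the target rotation angle, and an improved elementary-angle selection method bypasses unnecessary iterations; amplitude correction is implemented with shifts and subtractions, reducing hardware resource consumption. With the angle error set below 10^-5 rad, the number of iterations falls to 7 or fewer. In a DDFS (Direct Digital Frequency Synthesizer) application, interval compression is used to implement a 20-bit fixed-point fractional circuit design on a Xilinx FPGA. Simulation and measurement results show that the algorithm's amplitude error is below 2×10^-5 and its output latency is no more than 43.5 ns, with no increase in hardware resource consumption.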
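For context, the classic CORDIC rotation that this work improves upon can be sketched as below. This is the textbook fixed-sequence version, not the selective-iteration variant proposed in the paper. It uses floating point for readability, whereas hardware would replace the multiplications by powers of two with shifts. The function name and the default iteration count are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Return (sin(theta), cos(theta)) using classic rotation-mode CORDIC.

    Converges for |theta| up to roughly 1.74 rad; larger angles need
    range reduction first.
    """
    # Elementary rotation angles atan(2**-i) and the accumulated gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start with the gain pre-compensated so no final scaling is needed.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x
```

Every iteration here is executed regardless of the target angle, which is the cost the abstract's table-driven angle reduction and iteration bypassing aim to cut.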

15.
Internet protocol (IP) address lookup is one of the major performance bottlenecks in high-end routers. This paper presents an architecture for an IP address lookup engine based on programmable finite-state machines (FSMs). The IP address lookup problem can be translated into the implementation of a large FSM. Our hardware engine is then used to implement this FSM using a structured approach, in which the large FSM is broken down into a set of smaller FSMs which are then mapped into reconfigurable hardware blocks. The design of our hardware engine is based on a regular and well structured architecture, which is easy to scale. Our simulation results demonstrate that the FSM based architecture can easily scale to wire speed performance at OC-192 rates. Unlike previous approaches, the performance of our architecture is not constrained by memory bandwidth and is, therefore, in principle scalable with very large scale integration technology.
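One way to see why longest-prefix lookup maps onto a state machine is a binary trie, where each node is a state and each address bit drives one transition. The sketch below is a software illustration under that assumption only; it does not reproduce the paper's partitioning into smaller FSMs or its hardware mapping. Prefixes and addresses are written as bit strings, and all names are hypothetical.

```python
def build_trie(routes):
    """Build a binary trie from (prefix_bits, next_hop) pairs.

    Each dict is one state; keys "0" and "1" are transitions, and "hop"
    marks a state where a prefix ends.
    """
    root = {}
    for prefix_bits, next_hop in routes:
        node = root
        for bit in prefix_bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop
    return root


def lookup(trie, addr_bits):
    """Walk the trie one bit per step and return the longest-prefix match."""
    node = trie
    best = node.get("hop")
    for bit in addr_bits:
        node = node.get(bit)
        if node is None:             # no transition: stop at the last match
            break
        best = node.get("hop", best)
    return best
```

With routes `[("", "default"), ("10", "A"), ("1011", "B")]`, the address `"10110000"` resolves to `"B"` and `"10010000"` to `"A"`.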

16.
Based on an algorithm derived from the new Chinese remainder theorem I, we present three new residue-to-binary converters for the residue number system (2^n-1, 2^n, 2^n+1) designed using 2n-bit or n-bit adders with improvements on speed, area, or dynamic range compared with various previous converters. The 2n-bit adder based converter is faster and requires about half the hardware required by previous methods. For n-bit adder-based implementations, one new converter is twice as fast as the previous method using a similar amount of hardware, whereas another new converter achieves improvement in either speed, area, or dynamic range compared with previous converters.
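To make the conversion problem concrete, the sketch below converts between binary and the three-moduli residue representation, assuming the moduli set (2^n - 1, 2^n, 2^n + 1). It uses the ordinary Chinese remainder theorem in software rather than the new CRT-I adder-based hardware design the abstract describes. The function names are hypothetical, and `pow(a, -1, m)` requires Python 3.8 or later.

```python
def moduli(n):
    """Pairwise-coprime moduli set assumed for this example."""
    return (2 ** n - 1, 2 ** n, 2 ** n + 1)


def to_residues(x, n):
    """Binary-to-residue: reduce x modulo each of the three moduli."""
    return tuple(x % m for m in moduli(n))


def from_residues(residues, n):
    """Residue-to-binary via the standard Chinese remainder theorem."""
    ms = moduli(n)
    dynamic_range = ms[0] * ms[1] * ms[2]
    total = 0
    for r, m in zip(residues, ms):
        partial = dynamic_range // m              # product of the other moduli
        total += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return total % dynamic_range
```

For n = 4 the moduli are 15, 16, and 17, giving a dynamic range of 4080; the value 1234 maps to residues (4, 2, 10) and converts back unchanged.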

17.
A new approach is proposed to reduce the memory requirements of the multilevel fast multipole algorithm (MLFMA) when applied to the higher order Galerkin's method. This approach represents higher order basis functions by a set of point sources such that a matrix-vector multiply is equivalent to calculating the fields at a number of points from given current sources at these points. The MLFMA is then applied to calculate the point-to-point interactions. This permits the use of more levels in MLFMA than applying MLFMA to basis-to-basis interactions directly and, thus, reduces the memory requirements significantly.

18.
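A memory management algorithm for the embedded operating system μC/OS-II   Total citations: 1, self-citations: 1, citations by others: 0
To address the shortcomings of the μC/OS-II memory management mechanism, a new memory management algorithm is proposed. Small memory regions are divided into fixed-size blocks organized by a bitmap index, and large memory regions are organized as a linked list. Experiments show that the method improves memory allocation speed and utilization, especially in systems where block sizes vary widely.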
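The bitmap-indexed half of such a scheme can be sketched as follows: a pool of fixed-size blocks whose free/used state is kept in a single integer bitmap. This is only an illustration of the general idea, written in Python for brevity rather than the C that μC/OS-II uses. The class and method names are hypothetical, and the linked-list handling of large allocations is not shown.

```python
class BitmapPool:
    """Pool of fixed-size blocks; bit i of free_map is 1 while block i is free."""

    def __init__(self, block_count):
        self.block_count = block_count
        self.free_map = (1 << block_count) - 1   # all blocks start free

    def alloc(self):
        """Return the index of the lowest free block, or None if the pool is full."""
        if self.free_map == 0:
            return None
        lowest = self.free_map & -self.free_map  # isolate the lowest set bit
        self.free_map &= ~lowest                 # mark that block as used
        return lowest.bit_length() - 1

    def free(self, index):
        """Return block `index` to the pool; reject bad or repeated frees."""
        if not 0 <= index < self.block_count:
            raise ValueError("block index out of range")
        bit = 1 << index
        if self.free_map & bit:
            raise ValueError("block is already free")
        self.free_map |= bit
```

Finding a free block is a single bit operation rather than a list walk, which is the kind of speedup a bitmap index offers for small allocations.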

19.
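Design of a new arbitrary waveform generator based on USB and an MCU   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a waveform generator built around the USB chip CH375, the MCU chip ADuC842, and the high-speed DDS chip AD9850, focusing on the instrument's overall structure, the waveform generation hardware circuit, and the interface design with the CH375. The instrument stores and plays back special waveform data, generates common function waveforms, and allows flexible adjustment of waveform frequency, phase, and amplitude.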

20.
Low-Cost Fast VLSI Algorithm for Discrete Fourier Transform   Total citations: 1, self-citations: 0, citations by others: 1
A prime N-length discrete Fourier transform (DFT) can be reformulated into an (N-1)-length complex cyclic convolution and then implemented by systolic array or distributed arithmetic. In this paper, a recently proposed hardware-efficient fast cyclic convolution algorithm is combined with the symmetry properties of the DFT to get a new hardware-efficient fast algorithm for small-length DFTs, and then WFTA is used to control the increase of the hardware cost when the transform length N is large. Compared with previously proposed low-cost DFT and FFT algorithms with computation complexity of O(log N), the new algorithm can save 30% to 50% of the multipliers on average and improve the average processing speed by a factor of 2 when the DFT length N varies from 20 to 2040. Compared with previous prime-length DFT designs, the proposed design can save a large amount of hardware cost at the same processing speed when the transform length is long. Furthermore, the proposed design offers many more choices of applicable DFT transform lengths, and the processing speed can be flexibly balanced against the hardware cost.
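The reformulation in the first sentence is commonly known as Rader's algorithm, and a minimal sketch of it follows. Index permutations by a primitive root g turn the prime-length DFT into an (N-1)-point cyclic convolution. The convolution here is computed naively for clarity, whereas the paper relies on a fast cyclic convolution algorithm that is not reproduced. Function names are hypothetical, N is assumed to be an odd prime, and `pow(g, -1, N)` requires Python 3.8 or later.

```python
import cmath


def primitive_root(p):
    """Smallest primitive root modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def rader_dft(x):
    """DFT of a sequence whose length N is an odd prime, via cyclic convolution."""
    n_len = len(x)
    g = primitive_root(n_len)
    g_inv = pow(g, -1, n_len)

    # Permuted input a[q] = x[g^q] and twiddle sequence b[q] = W^(g^-q).
    a = [x[pow(g, q, n_len)] for q in range(n_len - 1)]
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, n_len) / n_len)
         for q in range(n_len - 1)]

    out = [0j] * n_len
    out[0] = sum(x)                               # the DC term needs no twiddles
    for p in range(n_len - 1):
        # (N-1)-point cyclic convolution, evaluated directly for clarity.
        conv = sum(a[q] * b[(p - q) % (n_len - 1)] for q in range(n_len - 1))
        out[pow(g_inv, p, n_len)] = x[0] + conv   # X[g^-p] = x[0] + (a * b)[p]
    return out
```

Calling `rader_dft([1, 2, 3, 4, 5])` gives the same five values as the direct DFT definition, up to floating-point rounding.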
