Similar literature
 19 similar documents found; search took 156 ms
1.
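To reduce the average branch overhead and average memory-access latency of very long instruction word (VLIW) architectures, and to shrink the code size of VLIW programs, a new method of applying branch prediction and value prediction to VLIW architectures is proposed. The conflict between the dynamic prediction techniques of existing superscalar architectures and the static instruction-level parallelism of VLIW architectures is analyzed first. The original branch and memory-load instructions are then extended so that they correspond to different numbers of delay slots, and the pipeline is stalled or the register write-back is delayed depending on the instruction, which resolves the disruption of VLIW static scheduling cycles caused by dynamic prediction. Experiments based on the Gem5 simulation platform and on benchmark programs for the Tsinghua University Magnolia VLIW digital signal processor (DSP) show that the branch prediction and value prediction techniques significantly improve the performance of the VLIW architecture and reduce the code size of VLIW programs.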

2.
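This paper proposes a normalized adaptive predictive vector quantization (NAPVQ) algorithm for compressing raw SAR data. NAPVQ first predicts the input vector with a linear vector predictor and then vector-quantizes the residual between the original vector and the predicted vector. The algorithm can be viewed as an extension of differential pulse modulation to vector quantization, and it outperforms the block adaptive quantization (BAVQ) algorithm and the normalized predictive adaptive quantization (NPAQ) algorithm. Further analysis of algorithm complexity shows that NAPVQ achieves a reasonable trade-off between complexity and performance and is of practical value.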
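The predict-then-quantize-the-residual loop described in this abstract can be sketched roughly as follows. This is only an illustration of generic predictive vector quantization, not the NAPVQ algorithm: the previous-vector predictor, the hand-picked codebook, and the function names `nearest`, `pvq_encode`, and `pvq_decode` are all assumptions, and the normalization and adaptation steps of NAPVQ are omitted.

```python
# Sketch of predictive vector quantization (NOT the NAPVQ algorithm itself).
# Assumptions: the predictor is simply the previously reconstructed vector,
# and the residual codebook is supplied by the caller.

def nearest(codebook, v):
    """Return the index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def pvq_encode(vectors, codebook):
    """Encode each vector as the codebook index of its prediction residual."""
    prev = [0.0] * len(vectors[0])       # encoder and decoder start from the same state
    indices = []
    for v in vectors:
        residual = [x - p for x, p in zip(v, prev)]
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Track the decoder's reconstruction so both sides predict identically.
        prev = [p + c for p, c in zip(prev, codebook[idx])]
    return indices

def pvq_decode(indices, codebook, dim):
    """Rebuild the vectors by adding each decoded residual to the prediction."""
    prev = [0.0] * dim
    out = []
    for idx in indices:
        prev = [p + c for p, c in zip(prev, codebook[idx])]
        out.append(prev)
    return out
```

Because the encoder predicts from its own reconstruction rather than from the original input, quantization error does not accumulate between encoder and decoder.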

3.
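To address the insufficient prediction accuracy of traditional low-complexity predictive vector quantization image coding algorithms, the effect of pixel distance on correlation is analyzed, several improved prediction schemes based on neighboring pixels are proposed, and a prediction scheme that adapts to edge orientation is also proposed. Simulation results show that predictive vector quantization algorithms using these schemes significantly improve vector prediction accuracy and image coding performance while keeping computational complexity low.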

4.
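Pipelining is a key technique for building high-performance CPUs. Many researchers have implemented pipelined MIPS CPUs on FPGAs, but they resolve pipeline hazards only by simply stalling the pipeline. This paper describes a fairly general five-stage pipelined MIPS CPU structure and the pipeline hazards that can occur in it. On this basis, it details the design and implementation in a MIPS CPU of two techniques for resolving pipeline hazards: data forwarding (bypassing) and dynamic branch prediction. Finally, simulation of an instruction sequence verifies that these techniques reduce the number of clock cycles needed to execute the instructions.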

5.
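The severe imbalance between processor performance and memory bandwidth and latency limits the overall performance of today's computing systems. Memory performance is insensitive to process technology, so in the post-Moore era it is hard to keep gaining processor performance from iterations of integrated-circuit manufacturing processes; attention has therefore shifted to architectural innovation as the route to higher-performance computing systems. Processor value prediction is a solution that can effectively alleviate the memory wall problem without changing the memory system: by speculatively breaking true data dependences, it lets more instructions execute in parallel in an out-of-order processor instead of waiting for long-latency instructions such as memory accesses. Value prediction has made substantial progress on many fronts in recent years, yet no commercial processor uses it, mainly because it still faces many challenges: existing processor pipeline architectures cannot use value prediction directly; the mechanism needed to deliver predicted values requires extra hardware resources; the large storage overhead of value predictors makes them hard to implement on chip; and because the performance penalty of a value misprediction is high, a value predictor with low accuracy degrades processor performance. To address these problems, this paper centers on value prediction and reviews in detail the research results in China and abroad, and their strategies for each challenge, in three areas: pipeline architecture, value predictor structure, and misprediction recovery mechanisms. Finally, the paper summarizes current processor value prediction techniques and, regarding future research directions, ...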

6.
《今日电子》1997,(5):63-66
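Application examples: This section presents some examples of implementing basic coding structures with the MMX instruction set. Conditional selection: Multimedia applications must process large amounts of data. Sometimes data must be selected according to a conditional test on the input data. By adopting microarchitectures that improve performance and enable deeper pipelines, Intel has raised the performance of its processor family to a new level. Because a misprediction can cause a pipeline flush and reduce performance, branch prediction is an important technique for keeping the pipeline running efficiently. The following example presents a way to reduce the use of branch instructions (especially those that are data-dependent and therefore hard to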

7.
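Minimum Distortion Prediction / Multistage Vector Quantization Based on the Wavelet Transform   Total citations: 1, self-citations: 0, citations by others: 1
The compression performance of a vector quantizer improves as the dimension increases, but its complexity also grows exponentially with dimension, which limits the use of high-dimensional vectors. This paper exploits the correlation between the subbands produced by the wavelet transform and proposes a new minimum distortion prediction / multistage vector quantization algorithm. On the one hand, minimum distortion prediction lowers the time complexity, so that encoding a 63-dimensional vector costs only as much time as encoding a 15-dimensional vector; on the other hand, an enhanced multistage vector quantization algorithm lowers the complexity further. Complexity is greatly reduced while good coding performance is retained.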

8.
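Branch prediction is an important factor limiting improvements in microprocessor performance and has therefore long been a focus of microprocessor design research. This paper proposes a new method that improves dynamic two-level adaptive branch prediction: a bi-mode predictor based on a bi-mode structure. It predicts branches using both inter-branch correlation and intra-branch (self) correlation, and thereby achieves higher branch prediction accuracy.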

9.
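This paper introduces the characteristics and algorithms of two speech coding standards, AMR-NB and G.729A, and compares them in four areas: linear prediction analysis, pitch search, algebraic codebook search, and gain quantization. For linear prediction analysis, it describes how the two algorithms differ in windowing and in LSP quantization and interpolation; for pitch search, it analyzes the differences in open-loop and closed-loop pitch search; and it sets out the differences in algebraic codebook structure, search algorithm, and gain parameter quantization. Finally, test results are given for the two standards' speech quality, computational complexity, and space complexity.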

10.
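Low-density parity-check (LDPC) decoders for the DVB-S2 standard face pressing demands for low complexity, high flexibility, and general applicability in deep-space communication. Based on a study of the quantization structure in LDPC decoding algorithms, a design method for a dynamic adaptive quantization structure is proposed. Building on conventional uniform hardware quantization, the method introduces a dynamic adaptive quantization structure for the data initialization and iterative decoding of the modified Min-Sum decoding algorithm. This resolves the complexity imbalance between check-node and variable-node operations when decoding DVB-S2 LDPC codes and thereby improves decoder performance. Experiments show that, for the DVB-S2 LDPC code with code length 16 200 and code rate 1/2, the dynamic adaptive quantization structure saves 4% of hardware resources compared with a conventional uniform quantization structure. In addition, the dynamic adaptive quantization structure supports dynamic configuration, which ensures the flexibility and general applicability of the DVB-S2 LDPC decoder.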

11.
Research on Branch Prediction Schemes for Superscalar Processors   Total citations: 1, self-citations: 1, citations by others: 0
陈智勇 《微电子学与计算机》2006,23(11):118-120,125
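As the pipeline depth and issue width of high-performance superscalar processors increase, designing an excellent branch predictor has become increasingly important for exploiting the potential performance of wide-issue, deeply pipelined processors. Conventional two-level branch predictors predict a branch outcome from either local or global branch history. This paper presents a new branch prediction scheme, called LGshare, which uses both global and local branch history to improve the branch prediction accuracy of superscalar microprocessors. For a fixed pattern history table (PHT) size, LGshare achieves higher branch prediction accuracy than conventional two-level predictors.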
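For orientation, a conventional two-level predictor of the kind this abstract uses as its baseline might look like the sketch below. It is a GAg-style model (one global history register indexing a pattern history table of 2-bit saturating counters); the class name, the 4-bit history length, and the `accuracy` helper are assumptions for illustration. The abstract does not say how LGshare combines local and global history, so that part is not modeled.

```python
# Sketch of a conventional two-level (GAg-style) branch predictor.
# Assumption: this is the generic baseline, not the LGshare scheme.

class TwoLevelPredictor:
    """A global history register indexes a table of 2-bit saturating counters."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0                        # last N outcomes, newest in bit 0
        self.pht = [1] * (1 << history_bits)    # counters start at "weakly not taken"

    def predict(self):
        return self.pht[self.history] >= 2      # counter value 2 or 3 means "taken"

    def update(self, taken):
        c = self.pht[self.history]
        self.pht[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask

def accuracy(predictor, outcomes):
    """Fraction of outcomes predicted correctly, training the predictor as it goes."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)
```

On a strictly alternating taken/not-taken stream, the history register separates the two cases into different counters, so the predictor becomes exact after a short warm-up.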

12.
Coprocessor design is one application of high-level synthesis. We want to focus on high-performance coprocessors to speed up time-critical parts in hardware-software codesign of embedded controllers. Time-critical software parts often contain nested loops, often with data-dependent branches and a data-dependent number of iterations. When (loop) pipelining is employed for high performance, the control dependencies become a dominant limitation to pipeline utilization. Branch prediction is a possible approach, but is usually restricted to a few instructions and to one branch because of hardware and control overhead. Multiple branch prediction and speculative computation take a more global view of the program flow. We give practical examples of how speculative computation with multiple branch prediction increases performance far beyond the usual ASAP scheduling based on a CDFG. For scheduling, speculative computation requires a modification of the CDFG and, for the allocation phase, the insertion of register sets to save the processor status. The controller needs slight modification. We conclude that manual application of our approach will in general be too difficult, such that it can only be used in connection with synthesis.

13.
An increasing cache latency in next-generation processors incurs profound performance impacts in spite of advanced out-of-order execution techniques. One way to circumvent this cache latency problem is to predict load values at the onset of pipeline execution by exploiting either the load value locality or the address correlation of stores and loads. In this paper, we describe a new load value speculation mechanism based on the program syntax correlation of stores and loads. We establish a symbolic cache (SC), which is accessed in early pipeline stages to achieve a zero-cycle load. Instead of using memory addresses, the SC is accessed by the encoding bits of base register ID plus the displacement directly from the instruction code. Performance evaluations using SPEC95 and SPEC2000 integer programs on SimpleScalar simulation tools show that the SC achieves higher prediction accuracy in comparison with other load value speculation methods, especially when hardware resources are limited.
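The indexing idea, looking up a predicted load value by (base register ID, displacement) instead of by memory address, can be illustrated with a small software model. Everything here is hypothetical: the class and method names are invented, the table is unbounded, and the rule that a write to a base register invalidates entries keyed by it is an assumption made so the model stays consistent, not something the abstract states.

```python
# Toy software model of a cache keyed by (base register, displacement).
# Assumptions: unbounded table; invalidation on base-register writes is our own rule.

class SymbolicCache:
    def __init__(self):
        self.entries = {}    # (base_reg, displacement) -> last value stored there

    def on_store(self, base_reg, disp, value):
        """Record the value a store wrote through this base/displacement pair."""
        self.entries[(base_reg, disp)] = value

    def on_register_write(self, reg):
        """Drop entries keyed by a register whose contents have changed."""
        for key in [k for k in self.entries if k[0] == reg]:
            del self.entries[key]

    def predict_load(self, base_reg, disp):
        """Return a speculative load value, or None when there is no prediction."""
        return self.entries.get((base_reg, disp))
```

A load that uses the same base register and displacement as an earlier store gets its value without computing an address, which is what makes an early-pipeline lookup possible in principle.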

14.
There are abundant intra and inter prediction modes in the AVS video coding standard. Rate-distortion optimized mode decision can fully utilize this flexibility to improve the spatio-temporal prediction efficiency and maximize the coding efficiency. However, the implementation complexity is dramatically high due to the huge throughput burden. In this work, a hardware-oriented mode decision algorithm is tailored for VLSI implementation for high-definition video coding. Mode preselection is employed to alleviate the dramatic throughput burden. Also, an intelligent pipeline scheduling mechanism is proposed to break the intrinsic data dependency in intra prediction, which is directly related to mode decision. The proposed simplified algorithm is well suited for hardware implementation with a small performance penalty. Finally, the VLSI architecture is proposed with a good trade-off between circuit consumption and rate-distortion performance.

15.
In modern systems, many well-known techniques (e.g., dynamic voltage and frequency scaling, job scheduling, etc.) have been developed to achieve low power, high performance, appropriate quality of service, or other specific purposes. Workload prediction is an extremely critical factor for bringing these techniques into full play. However, it is very difficult to accurately predict the workloads of upcoming tasks if they vary drastically. In this paper, we propose a new hybrid fuzzy-Kalman filter and the corresponding area-efficient hardware architecture to accurately and quickly predict workloads with large variation. To decrease the hardware complexity while maintaining sufficient accuracy, the computation of the Kalman gain is simplified with a lookup table method. In addition, the workload and covariance values in the Kalman filter are properly normalized and truncated to significantly reduce the bit length of the hybrid workload predictor. Furthermore, a simplified fuzzy controller is developed to adaptively adjust the measurement noise covariance of the Kalman filter so that the prediction error can be further lowered. Experimental results on real applications show that the proposed hybrid fuzzy-Kalman filter achieves lower prediction error and smaller hardware area than previous workload predictors.
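As background for the Kalman part of this predictor, a plain scalar Kalman filter applied to a workload trace could be sketched as below. The random-walk workload model, the noise values `q` and `r`, and the function name are assumptions; the lookup-table gain, the normalization and truncation, and the fuzzy adjustment of the measurement noise covariance described in the abstract are not reproduced.

```python
# Sketch of a scalar Kalman filter used as a one-step-ahead workload predictor.
# Assumptions: random-walk workload model, fixed noise variances q and r.

def kalman_predict_workload(measurements, q=1.0, r=4.0, x0=0.0, p0=1.0):
    """Return the prediction made before each measurement is observed."""
    x, p = x0, p0                 # state estimate and its error variance
    predictions = []
    for z in measurements:
        # Predict: the random-walk model keeps the estimate; uncertainty grows by q.
        p = p + q
        predictions.append(x)
        # Update with the observed workload z.
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)
        p = (1.0 - k) * p
    return predictions
```

In this form `r` is a constant. The abstract's fuzzy controller would, by its description, vary the measurement noise covariance at run time, which changes how strongly each new measurement pulls the estimate.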

16.
The development of new hybrid diversity methods has been demanded by the continuous growth of information rates. These schemes achieve an improved balance between hardware complexity and performance relative to conventional ones. In this paper, a new hybrid switched diversity system is proposed. It is called generalized multi-branch switch and examine combining. The new scheme is a generalization of the classic one known as switch and examine combining, with considerably improved performance. The performance of this scheme is analyzed over independent and identically distributed Nakagami-m fading channels using well-known performance and complexity measures. More specifically, important performance metrics, such as the outage probability, the average bit error rate, and the capacity under different adaptive transmission policies, have been analytically studied. It is shown that the proposed approach offers an important improvement in performance without considerably affecting the complexity.
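The classic switch and examine combining rule that this paper generalizes can be sketched as a branch-selection function. This is a simplified model under stated assumptions: per-branch SNR values are given as a list, the function name is hypothetical, and when no branch meets the threshold the receiver is assumed to stay on the last branch examined. The generalized multi-branch variant proposed in the paper is not modeled.

```python
# Sketch of classic switch-and-examine combining (SEC) branch selection.
# Assumption: if no branch is acceptable, stay on the last branch examined.

def sec_select(snrs, current, threshold):
    """Return the index of the branch the receiver ends up using."""
    n = len(snrs)
    for step in range(n):
        idx = (current + step) % n       # examine the current branch first, then the next ones
        if snrs[idx] >= threshold:
            return idx                   # first acceptable branch wins; stop examining
    return (current + n - 1) % n         # all branches examined, none acceptable
```

Unlike selection combining, the receiver does not search for the best branch; it stops at the first acceptable one, which is where the reduced complexity comes from.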

17.
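Research and Implementation of Two FFT-Based Fast Pseudo-Code Acquisition Schemes   Total citations: 6, self-citations: 0, citations by others: 6
This paper proposes two FFT-based schemes for fast pseudo-code (PN code) acquisition: one based on a fractional sample-rate converter and the other based on a decimator. Both acquisition circuits use design reuse to cut hardware size substantially, parallel design to raise the system's computation speed considerably, and a block floating-point algorithm to improve dynamic range and computational precision. Each circuit is implemented on a single FPGA. Simulation and test results show that the circuit based on the fractional sample-rate converter uses more hardware resources than the decimator-based circuit but achieves higher acquisition accuracy.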
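The core of frequency-domain code acquisition is a circular correlation computed as a product of transforms. The sketch below shows that idea only: it uses a naive O(N²) DFT from the standard library in place of a real FFT, assumes one sample per chip with no noise or Doppler, and the function names are hypothetical. The sample-rate conversion, decimation, and block floating-point arithmetic described in the abstract are not modeled.

```python
# Sketch of code-phase acquisition by frequency-domain circular correlation.
# Assumptions: naive DFT instead of an FFT, one sample per chip, noiseless input.
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def acquire_code_phase(received, code):
    """Return the circular delay at which `code` best matches `received`."""
    R, C = dft(received), dft(code)
    # Multiplying by the conjugate spectrum gives correlation instead of convolution.
    corr = dft([r * c.conjugate() for r, c in zip(R, C)], inverse=True)
    mags = [abs(v) for v in corr]
    return mags.index(max(mags))
```

All code phases are tested in one transform-multiply-inverse pass, rather than one correlation per candidate delay, which is the reason FFT-based acquisition is fast.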

18.
An architecture based on a systolic array for real-time image template matching is presented. The architecture consists mainly of four elements: a digitizer, a two-dimensional systolic array combined with variable-length shift register arrays, an adder tree, and a comparator. All the elements form a four-stage pipeline. The image data enter the pipe sequentially in the same order as the TV raster scan. The matching computation is, however, performed in a parallel manner. The analyses on time complexity and hardware complexity have shown that real-time performance is achieved. The analyses have also shown that the processing speed is higher and the hardware is simpler when compared to the architecture presented by Chou and Chen.
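A sequential software reference for the computation such an array parallelizes might look like the following. The abstract does not name the matching measure, so the sum of absolute differences used here is an assumption, as is the function name; the per-window sum loosely corresponds to the adder tree and the running minimum to the comparator.

```python
# Sequential reference sketch of template matching.
# Assumption: the match score is the sum of absolute differences (SAD).

def match_template(image, template):
    """Return (row, col, score) of the window with the smallest SAD."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Accumulate the per-pixel differences for this window position.
            sad = sum(abs(image[r + i][c + j] - template[i][j])
                      for i in range(th) for j in range(tw))
            # Keep the best (lowest) score seen so far.
            if best is None or sad < best[2]:
                best = (r, c, sad)
    return best
```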

19.
A new mechanism, the BHT cache, that makes self history-based branch predictors exploit branch correlation information is presented. The simulation results show that self history-based branch predictors incorporating the BHT cache improve prediction accuracy at a very small extra hardware cost.
