Similar literature
 19 similar documents found (search time: 609 ms)
1.
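To improve the memory-access performance of virtual machine interpreters on non-uniform memory access (NUMA) architectures, this work studies memory-access optimization techniques for interpreters under NUMA, proposes an interpreter memory-access optimization scheme for NUMA architectures, and designs and implements static and dynamic instruction-dispatch optimization methods for the interpreter. Under this scheme, the virtual machine first obtains NUMA node information at startup and automatically generates all data structures the interpreter needs in every NUMA node; at run time, the interpreter uses static or dynamic instruction dispatch to keep the memory accesses of its execution threads local to their NUMA nodes. Experimental results show that these methods significantly improve interpreter performance on NUMA systems: overall performance on the DaCapo benchmark suite improves by 8%, with a maximum improvement of 23%. The implementation cost is low, and the approach applies to the vast majority of NUMA server systems.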

2.
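This work analyzes the pointer-chasing pattern in processor memory accesses and identifies two problems with pointer-chasing operations in applications that use linked data structures: low data-prefetch accuracy and high memory-access latency. To improve pointer-chasing memory performance, it proposes the instruction-label-assisted data prefetching (ILAMP) technique. ILAMP is a prefetch mechanism guided by instruction labels: a new memory-access instruction is added to the instruction set architecture, and at the decode stage this instruction generates a special memory-access label indicating that the loaded content is a pointer. On a cache miss, the label is propagated all the way to the memory controller. When the loaded pointer returns to the memory controller, the controller extracts the pointer and issues a prefetch request. Experimental results show that, compared with the baseline without ILAMP, the technique reduces the average access latency of DRAM read requests by about 15% on average, achieves prefetch accuracy above 77%, increases memory-access bandwidth by about 10%, and has a hardware overhead of about 1 KB.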

3.
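To address the difficulty of parallel debugging caused by the nondeterministic execution of parallel programs on multicore processors, this work proposes a fast hardware-based deterministic replay method named 时间切割者 (literally, "Time Cutter"). The method uses a parallelism-oriented recording mechanism to distinguish blocks of memory-access instructions that executed in parallel in the original run from instruction blocks that did not. During replay, it avoids serializing the memory-access blocks that executed in parallel in the original run, which keeps the replay overhead low. Simulation experiments on the multicore simulator Sim-Godson show that replay is fast, with a performance overhead of only about 2%. The method also requires only simple hardware support and may be applied in the future development of Chinese domestic multicore processors.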

4.
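Multi-table joins are difficult to accelerate in hardware. On the one hand, the number of tables in a multi-table join request is not fixed and the join patterns vary, so flexible computation requests conflict with fixed hardware behavior. On the other hand, the intermediate results of a multi-table join grow as tables are added, and managing and maintaining the data structures demands higher hardware overhead. To support flexible and efficient multi-table join computation, this paper proposes a hardware/software co-optimization method. In software, multi-table joins are abstracted into two computation modes, forward and backward, which together support different kinds of multi-table joins. The hardware design co-optimizes memory access and computation: a regular hardware hash-table structure raises memory-access bandwidth, and homogeneous dedicated compute engines that support both forward and backward computation, equipped with multiple data channels and an instruction control system, perform efficient parallel operations and improve the efficiency of multi-table hash joins. Experimental results show that, compared with a central processing unit (CPU) executing table joins, a single compute engine improves performance by 9.2 to 11.0 times. With multi-way parallelism, an 8-way parallel multi-table hash engine fully utilizes the board's off-chip (DDR) memory bandwidth and achieves a performance improvement of more than 71.1 times over the CPU.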
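
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of a chained multi-table hash join. It is not the paper's forward/backward computation modes or its hardware engine; the function names `hash_join` and `multi_join` and the list-of-dicts row format are assumptions made for illustration only.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on `key` with a build/probe hash join."""
    # Build phase: index the right-hand table by its join key.
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    # Probe phase: look up each left row and merge it with every match.
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def multi_join(tables, keys):
    """Chain hash joins left to right.

    keys[i] is the column used to join the running result with tables[i + 1].
    """
    result = tables[0]
    for table, key in zip(tables[1:], keys):
        result = hash_join(result, table, key)
    return result


if __name__ == "__main__":
    orders = [{"oid": 1, "cid": 10}, {"oid": 2, "cid": 11}]
    customers = [{"cid": 10, "name": "a"}]
    items = [{"oid": 1, "sku": "x"}, {"oid": 1, "sku": "y"}]
    for row in multi_join([orders, customers, items], ["cid", "oid"]):
        print(row)
```

The intermediate result passed from one join to the next is what grows as tables are added, which is the data-structure management cost the abstract describes.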

5.
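Because control-transfer overhead is the main factor limiting the performance of binary translation and optimization systems, this work studies how to improve the performance of such systems, and proposes and implements a hardware/software co-design method, with low hardware design cost, based on a hardware content-addressable memory (CAM) mechanism. Experiments thoroughly analyze how the CAM size and the software replacement algorithm affect the CAM hit rate, and based on this analysis a novel combined hardware/software method for reducing the CAM miss rate is proposed. Compared with traditional software and hardware optimization methods, this method has low hardware implementation and verification complexity and a clear optimization effect. Experimental results show that it improves the overall performance of the binary translation system by 13.44%. The method has been deployed in the Loongson x86 binary translation system.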

6.
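As the volume of memory-access data in embedded graphics processing units (GPUs) grows, the memory-access subsystem has become an increasingly prominent bottleneck in performance, area, and power. Considering the data characteristics and memory-access requirements of graphics processors, together with the area and power constraints of embedded GPUs, this work proposes a memory-access subsystem architecture for embedded GPUs built on the Godson GPU platform. The design targets the memory-access characteristics of the graphics pipeline: it optimizes the cache structure and proposes a linked-list-based structure that improves memory-access efficiency while reducing area and power. To adapt the memory-access subsystem to parallel graphics pipelines, a screen-partitioning method is proposed that eliminates cache coherence problems while balancing the load on the subsystem. The design offers a reference for memory-access subsystem design in embedded GPUs.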

7.
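This work studies the overhead of virtual machine exit and resume, and proposes a deferred-save method that reduces the cost of saving and restoring context during virtual machine switches. The main idea is to modify the virtual machine software source code so that, when a virtual machine resumes, the software checks whether it is the same virtual machine that last exited, and thereby reduces the number of registers that must be saved and restored. The method requires no changes to the hardware design and supports multicore operating systems and multiple concurrently running virtual machines, so it is widely applicable. Experiments on the Loongson 3A1500 processor platform show that, compared with existing methods, the deferred-save method reduces virtual machine exit overhead by 65% and improves overall virtual machine performance by 3% to 10%.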
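
The idea of skipping the save and restore when the same virtual machine resumes can be illustrated with a small simulation. This is only a sketch under assumptions: the class name `LazyContext`, the dictionary used as a stand-in for hardware registers, and the `full_switches` counter are hypothetical and do not come from the paper or from any hypervisor's API.

```python
class LazyContext:
    """Toy model of deferred register save/restore across VM switches."""

    def __init__(self):
        self.owner = None        # VM whose register state is currently live
        self.hw = {}             # simulated hardware registers
        self.saved = {}          # per-VM saved register state
        self.full_switches = 0   # number of real save/restore operations

    def resume(self, vm):
        """Resume `vm`, touching saved state only if another VM was live."""
        if self.owner == vm:
            # Same VM as at the last exit: the registers are still valid,
            # so neither a save nor a restore is needed.
            return
        if self.owner is not None:
            # A different VM was live: save its state now (deferred save).
            self.saved[self.owner] = dict(self.hw)
        # Load the incoming VM's state (empty on its first run).
        self.hw = dict(self.saved.get(vm, {}))
        self.owner = vm
        self.full_switches += 1


if __name__ == "__main__":
    ctx = LazyContext()
    ctx.resume("A")
    ctx.hw["r1"] = 7
    ctx.resume("A")              # no save/restore: same VM resumes
    ctx.resume("B")              # A's registers are saved only now
    ctx.resume("A")
    print(ctx.hw, ctx.full_switches)
```

In this model an exit costs nothing by itself; the cost is paid only when a different virtual machine is scheduled next.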

8.
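This work evaluates and analyzes the overhead of a lock-based cache coherence protocol and, together with the design of a processor interface, implements a remote memory sharing mechanism on a distributed memory architecture. The mechanism improves communication performance between processor nodes and lowers communication latency in the coherence protocol. It also uses hardware locks to optimize the protocol's synchronization overhead, which keeps the lock manager node from trapping into handler routines and so reduces synchronization wait time. Measurements show that the hardware-optimized software cache coherence protocol greatly improves the performance of basic operations and achieves better speedup and scalability in real applications.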

9.
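In recent years, as deep neural networks have been widely applied across fields, models must be trained for each application scenario to obtain better parameters, so the demand for training speed keeps rising. However, existing work usually focuses only on accelerating compute-intensive layers and neglects memory-intensive layers. The execution efficiency of memory-intensive layers is determined mainly by memory bandwidth, so raising arithmetic speed alone has little effect on performance. Starting from execution order, this paper proposes fusing each memory-intensive layer with the compute-intensive layer before or after it into a single new layer, and performing the memory-intensive layer's operations as pre-processing of the fused layer's input data or post-processing of its output data. This greatly reduces the off-chip memory accesses made by memory-intensive layers during training and improves performance. For this fused execution scheme, the paper designs and implements a training-oriented accelerator that buffers pre-processing results and executes post-processing operations in parallel with compute-intensive layer operations, further improving the training performance of the fused layers. Experimental results show that, at a cost of 6.4% more area and 10.3% more power, the forward and backward phases of training improve in performance by 67.7% and 77.6%, respectively.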

10.
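This paper proposes an adaptive I/O direct cache access (ADCA) method for multicore processors that improves I/O memory-access performance and reduces the impact on other programs. Unlike traditional direct cache access (DCA), the method exploits the LRU stack property and uses sampled auxiliary tag directories to dynamically adjust the cache space available to DCA, while also optimizing the replacement and memory write policies for I/O data. Experimental results show that, compared with DCA, the method improves I/O bandwidth by about 10%. Compared with running SPEC concurrently with a network benchmark that uses direct memory access (DMA), SPEC integer and floating-point performance improve by 11.5% and 8.9%, respectively.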

11.
A differential evolution approach for solving the optimal power flow problem with multiple competing objectives is presented. Two sub-problems of optimal power flow, namely active power dispatch and reactive power dispatch, are considered. The problem is formulated as a nonlinear constrained true multi-objective optimisation problem with competing objectives. A constraint-domination approach is used to handle inequality constraints, which eliminates the need for penalty factors. The proposed approach was tested on the standard IEEE 30-bus system and compared with a conventional method. The results demonstrate the capability of the proposed approach to generate diverse and well-distributed Pareto-optimal solutions.
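
The abstract does not spell out its constraint-domination rule. A minimal sketch, assuming it follows the widely used constrained-domination principle (a feasible solution beats an infeasible one, two infeasible solutions are ranked by total violation, and two feasible solutions are ranked by Pareto dominance), could look like the following. The function names and the convention that inequality constraints are written as g_i(x) <= 0 are assumptions for illustration.

```python
def pareto_dominates(f_a, f_b):
    """True if f_a is no worse in every objective and better in one (minimisation)."""
    no_worse = all(x <= y for x, y in zip(f_a, f_b))
    strictly_better = any(x < y for x, y in zip(f_a, f_b))
    return no_worse and strictly_better


def violation(g):
    """Total violation of inequality constraints written as g_i(x) <= 0."""
    return sum(max(0.0, gi) for gi in g)


def constraint_dominates(f_a, g_a, f_b, g_b):
    """Compare two solutions without any penalty factor."""
    v_a, v_b = violation(g_a), violation(g_b)
    if v_a == 0 and v_b == 0:
        # Both feasible: ordinary Pareto dominance decides.
        return pareto_dominates(f_a, f_b)
    if v_a == 0 or v_b == 0:
        # Exactly one is feasible: the feasible solution wins.
        return v_a == 0
    # Both infeasible: the smaller total violation wins.
    return v_a < v_b


if __name__ == "__main__":
    # A feasible solution dominates an infeasible one even with worse objectives.
    print(constraint_dominates([5.0, 5.0], [-1.0], [1.0, 1.0], [0.2]))
```

Because feasibility is compared directly, no penalty weight has to be tuned, which is the benefit the abstract attributes to the approach.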

12.
Constraint handling is an important aspect of evolutionary constrained optimization. Currently, the mechanisms used for constraint handling in evolutionary algorithms mainly assist the selection process, not the actual search process. In this article, a genetic algorithm is first combined with a class of search methods, known as constraint consensus methods, that help infeasible individuals move towards the feasible region. This approach is also integrated with a memetic algorithm. The proposed algorithm is tested and analysed by solving two sets of standard benchmark problems, and the results are compared with those of other state-of-the-art algorithms. The comparisons show that the proposed algorithm outperforms other similar algorithms. The algorithm has also been applied to a practical economic load dispatch problem, where it again shows superior performance over other algorithms.

13.
A novel approach to concurrent error detection in finite-field multiplication over GF(2^m) that uses multiple-bit interlaced parity codes is presented. These codes are implemented as a generic parity checker, which means they can be used with any multiplier architecture. Relative to the number of parity bits used, much improved delay and error-detection performance are achieved compared with previously reported results, yet for the examples considered the area overhead did not exceed 12%. The proposed work is particularly important for cryptography implementations that employ GF(2^m) multipliers and require reliability and protection against adversarial attacks that use fault induction.
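
To make the term concrete, here is a minimal sketch of a k-bit interlaced parity code over an m-bit word, assuming the usual definition in which parity bit i covers the data bits whose position is congruent to i modulo k. It shows only how the check bits are formed and compared; the parity prediction logic for the multiplier, which is the paper's contribution, is not modelled. The function names are assumptions for illustration.

```python
def interlaced_parity(word, width, k):
    """Return k parity bits; bit i is the XOR of data bits j with j % k == i."""
    parity = [0] * k
    for j in range(width):
        parity[j % k] ^= (word >> j) & 1
    return parity


def detects_error(expected_word, observed_word, width, k):
    """True if the interlaced parity of the observed word differs from the expected one."""
    return interlaced_parity(expected_word, width, k) != interlaced_parity(
        observed_word, width, k
    )


if __name__ == "__main__":
    good = 0b1011
    bad = good ^ 0b0100          # single-bit fault injected at position 2
    print(interlaced_parity(good, 4, 2), detects_error(good, bad, 4, 2))
```

Any single-bit error flips exactly one parity bit and is therefore detected; two errors in positions that share the same residue modulo k cancel out, which is why using more parity bits improves detection.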

14.
Customer involvement in design activity is one of the principal components of mass customisation. Whereas many studies have proposed methods to enable customer co-design, more research is needed to determine the predictors of co-design and its associations with operations improvements. This study tests the relationships between proximity, co-design, and performance, and whether co-design mediates proximity-performance relationships. Following recent technology and collaborative trends, the study uses a three-dimensional operationalisation of customer proximity that includes physical, virtual, and affinity proximity measures. Regression analyses of data from 698 manufacturers in metal-mechanic industries suggest that virtual and affinity proximity relate positively to customer co-design, that co-design explains quality and delivery improvements, and that co-design mediates the relationship between virtual proximity and quality improvements.

15.
Security is one of the major challenges that devices connected to the Internet of Things (IoT) face today. Remote attestation is used to assess these devices' trustworthiness on the network by measuring the integrity of the device platform. Several software-based attestation mechanisms have been proposed, but none of them can detect runtime attacks. Although some researchers have attempted to tackle these attacks, the proposed techniques require additional secured hardware parts to be integrated with the attested devices. These solutions are expensive and not suitable in many cases. This paper proposes a dual attestation process, SAPEM, with two phases: static and dynamic. The static attestation phase examines the program memory of the attested device. The dynamic program flow attestation examines the execution correctness of the application code. It can detect code injection and runtime attacks that hijack the control flow, including data attacks that affect the program control flow. The main aim is to minimize attestation overhead while maintaining the ability to detect the specified attacks. We validated SAPEM by implementing it on a Raspberry Pi using its TrustZone extension. We tested it against the specified attacks and compared its performance with related work in the literature. The results show that SAPEM significantly reduces performance overhead while reliably detecting runtime attacks at the binary level.

16.
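Shi Di (时迪). Packaging Engineering (《包装工程》), 2019, 40(16): 201-204
Objective: Building on a dyadic analysis of communication between designers and non-designers in co-design, to construct a multi-party communication method model, with the aim of helping to resolve communication difficulties in co-design, advancing co-design practice and research, and offering a methodological reference for collaborative innovation. Methods: First, the problems in existing research on co-design communication are summarized; second, based on the conclusions of earlier research on dyadic communication methods in co-design, a multi-party communication method model is constructed; finally, the proposed model is put into practice and analyzed comparatively. Conclusion: The two communication models constructed, "single session, single proposal" and "single session, multiple proposals", both help to resolve communication problems in co-design, and model-based media can serve as an information-gathering tool that supports later review and analysis.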

17.
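A dynamic model of the overhead (bridge) crane mechanical system is established from the Lagrange equations and then simplified, laying the theoretical foundation for controller design. First, a new saturation function is constructed to address the coupling that arises when modeling the overhead crane mechanical system. Second, based on this saturation function, a decoupled sliding-mode controller is designed to achieve fast positioning of the crane and suppression of payload swing during transport. Third, an adaptive parameter is introduced to weaken the system chattering caused by the switching gain of the decoupled sliding-mode controller. Finally, simulations are carried out on the dynamic model of the overhead crane mechanical system. The simulation results show that the decoupled sliding-mode controller designed with the adaptive parameter has good control performance and improves the dynamic characteristics of the mechanical system.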

18.
This brief presents an efficient binary common subexpression elimination (BCSE)-based approach for designing a reconfigurable interpolation root-raised cosine (RRC) finite-impulse-response (FIR) filter whose coefficients change at runtime, for the multistandard wireless communication system known as software-defined radio (SDR). Reconfiguration is done conveniently by storing the coded coefficients in lookup tables (LUTs) and loading the required coefficient set into the interpolation filter. In the proposed method, which is based on a 4-bit BCSE algorithm, the number of binary common subexpressions (BCSs) formed in the coefficients is first reduced. This reduces the number of multiplexers, shifters, and adders in the multiplier structure, which improves the operating frequency. The number of addition operations is further reduced using programmable adders, and an efficient polyphase interpolation structure is implemented to reduce the hardware cost. When implemented on the Xilinx field-programmable gate array (FPGA) device XC2VP4FF672-6, the proposed design has a 49.5% lower area-delay product and a 28.6% higher operating frequency than a previously reported 2-bit BCSE-based technique. Similarly, when implemented on the XC2V3000FF1152-4, the proposed design supports a 93.14 MHz operating frequency, which is 59.2% and 74.2% higher than the 2-bit BCSE- and 3-bit BCSE-based approaches, respectively. The proposed structure also shows improved speed and area compared with distributed arithmetic (DA)-based and multiply-accumulate (MAC)-based approaches.

19.
Economic load dispatch is one of the most important problems in electric power system operation, management, and planning. In a large-scale system, the problem is more complex and an optimal solution is difficult to find, because the objective function is nonlinear and has a number of local optima. The combined economic emission dispatch (CEED) problem is to schedule the outputs of the committed generating units so as to meet the required load demand at minimum operating cost and minimum emission simultaneously. The main aim of economic load dispatch is to reduce the total production cost of the generating system while satisfying the necessary equality and inequality constraints. This leads to the development of CEED techniques. Several researchers have proposed techniques based on optimization methods to solve the CEED problem, but problems such as slow convergence and high computational complexity remain when methods such as the genetic algorithm (GA) are used. This paper proposes an efficient and reliable technique for combined fuel-cost and emission dispatch using the Modified Ant Colony Optimization (MACO) algorithm to produce better solutions. The simulation results indicate that the proposed MACO approach performs well.
